
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Load Data &#8212; PixieDust Documentation</title>
    <link rel="stylesheet" href="_static/better.css" type="text/css" />
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
    <link rel="stylesheet" href="_static/custom.css" type="text/css" />
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    './',
        VERSION:     '1.0',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true,
        SOURCELINK_SUFFIX: '.txt'
      };
    </script>
    <script type="text/javascript" src="_static/jquery.js"></script>
    <script type="text/javascript" src="_static/underscore.js"></script>
    <script type="text/javascript" src="_static/doctools.js"></script>
    <link rel="shortcut icon" href="_static/pd_icon.ico"/>
    <link rel="index" title="Index" href="genindex.html" />
    <link rel="search" title="Search" href="search.html" />
    <link rel="next" title="Display Data" href="displayapi.html" />
    <link rel="prev" title="Install PixieDust" href="install.html" />
  <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1" />
  </head>
  <body>
    <header id="pageheader"><h1><a href="index.html">
        PixieDust Documentation
    </a></h1></header>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body" role="main">
            
  <div class="section" id="load-data">
<h1>Load Data<a class="headerlink" href="#load-data" title="Permalink to this headline">¶</a></h1>
<div class="section" id="sample-data-sets">
<h2>Sample Data Sets<a class="headerlink" href="#sample-data-sets" title="Permalink to this headline">¶</a></h2>
<p>PixieDust comes with sample data. To start exploring the display() API and other PixieDust features, load and then visualize one of the bundled sample data sets.</p>
<p>To list the available data sets, run the following command in your notebook:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">pixiedust</span><span class="o">.</span><span class="n">sampleData</span><span class="p">()</span>
</pre></div>
</div>
<p>The output lists the data sets included with PixieDust.</p>
<a class="reference internal image-reference" href="_images/sample_data_sets.png"><img alt="Screenshot of PixieDust's sampleData() method." src="_images/sample_data_sets.png" style="width: 692.0px; height: 261.0px;" /></a>
<!-- START EXCLUDE --><div class="admonition note">
<p class="first admonition-title">Note</p>
<p>If you get an error and you’re running Spark 1.6, run the following commands to manually install the packages missing in 1.6. You need to do so only once:</p>
<div class="last highlight-default"><div class="highlight"><pre><span></span><span class="n">pixiedust</span><span class="o">.</span><span class="n">installPackage</span><span class="p">(</span><span class="s2">&quot;com.databricks:spark-csv_2.10:1.5.0&quot;</span><span class="p">)</span>
<span class="n">pixiedust</span><span class="o">.</span><span class="n">installPackage</span><span class="p">(</span><span class="s2">&quot;org.apache.commons:commons-csv:0&quot;</span><span class="p">)</span>
</pre></div>
</div>
</div>
<!-- END EXCLUDE --><p>To create a PySpark DataFrame from one of the samples, pass its number to sampleData(). For example, to load Set 6, Million Dollar Home Sales, run the command:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">home_df</span> <span class="o">=</span> <span class="n">pixiedust</span><span class="o">.</span><span class="n">sampleData</span><span class="p">(</span><span class="mi">6</span><span class="p">)</span>
</pre></div>
</div>
</div>
<div class="section" id="load-a-csv-using-its-url">
<h2>Load a CSV using its URL<a class="headerlink" href="#load-a-csv-using-its-url" title="Permalink to this headline">¶</a></h2>
<p>You can also pass a URL in place of the number. If you have a CSV file online, load it by passing its URL as the argument, like this:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">home_df</span> <span class="o">=</span> <span class="n">pixiedust</span><span class="o">.</span><span class="n">sampleData</span><span class="p">(</span><span class="s2">&quot;https://openobjectstore.mybluemix.net/misc/milliondollarhomes.csv&quot;</span><span class="p">)</span>
</pre></div>
</div>
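<p>Before loading a remote CSV, it can help to check its header row and a few records. The following is a minimal sketch that uses only the Python standard library; peek_csv is a hypothetical helper, not part of PixieDust, and the demo data and column names are made up for illustration.</p>

```python
import csv
import io


def peek_csv(stream, rows=3):
    """Return (header, first data rows) from a text stream of CSV data.

    Hypothetical helper for illustration; not part of PixieDust.
    """
    reader = csv.reader(stream)
    # The first record is treated as the header; an empty stream yields [].
    header = next(reader, [])
    # zip() stops after `rows` records, so the rest of the stream is not read.
    sample = [row for _, row in zip(range(rows), reader)]
    return header, sample


# Stand-in data; the column names here are invented for the demo.
demo = io.StringIO("city,price\nBoston,1200000\nAustin,1050000\n")
header, sample = peek_csv(demo, rows=1)
print(header, sample)
```

<p>For a real URL, the same function should work on the response from urllib.request.urlopen(url) wrapped in io.TextIOWrapper, assuming the file is UTF-8 encoded.</p>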
</div>
<div class="section" id="load-data-from-your-local-system">
<h2>Load data from your local system<a class="headerlink" href="#load-data-from-your-local-system" title="Permalink to this headline">¶</a></h2>
<p>Loading a CSV from your local file system is equally simple. Pass the file path as a file:// URL, like so:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">pixiedust</span><span class="o">.</span><span class="n">sampleData</span><span class="p">(</span><span class="s1">&#39;file:///Users/bradfordnoble/pixiedust/data/nz.csv&#39;</span><span class="p">)</span>
</pre></div>
</div>
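<p>The call above takes a file:/// URL rather than a bare path. As a rough sketch, such a URL can be built from an ordinary path with Python's pathlib; to_file_url is a hypothetical helper name, not a PixieDust function, and the commented-out sampleData call assumes a notebook where PixieDust is installed.</p>

```python
from pathlib import Path


def to_file_url(path):
    """Return a file:// URL for a local path.

    Hypothetical helper for illustration; not part of PixieDust.
    """
    # expanduser() replaces a leading "~" with the home directory;
    # absolute() anchors relative paths at the current working directory.
    # as_uri() percent-encodes characters such as spaces.
    return Path(path).expanduser().absolute().as_uri()


csv_url = to_file_url("/Users/bradfordnoble/pixiedust/data/nz.csv")
print(csv_url)

# In a notebook with PixieDust installed, the URL can then be passed along:
# nz_df = pixiedust.sampleData(csv_url)
```

<p>Building the URL this way avoids hand-typing the three slashes and handles paths that contain spaces.</p>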
</div>
<div class="section" id="other-data-sources">
<h2>Other Data Sources<a class="headerlink" href="#other-data-sources" title="Permalink to this headline">¶</a></h2>
<p>PixieDust provides these sample data sets as a convenience to help you get started fast. To load or connect to your own data source, follow the steps you normally would from within a notebook. Our team has created some notebook tutorials that show how to connect to Cloudant, Twitter, and other data sources. See <a class="reference external" href="https://developer.ibm.com/clouddataservices/2016/08/04/predict-flight-delays-with-apache-spark-mllib-flightstats-and-weather-data/">Predict Flight Delays with Apache Spark MLLib, FlightStats, and Weather Data</a> and <a class="reference external" href="https://developer.ibm.com/clouddataservices/2015/10/06/sentiment-analysis-of-twitter-hashtags/">Sentiment Analysis of Twitter Hashtags</a>.</p>
</div>
</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
        <div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table Of Contents</a></h3>
<ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="use.html">Use PixieDust</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="install.html">Install PixieDust</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">Load Data</a></li>
<li class="toctree-l2"><a class="reference internal" href="displayapi.html">Display Data</a></li>
<li class="toctree-l2"><a class="reference internal" href="packagemanager.html">Package Manager</a></li>
<li class="toctree-l2"><a class="reference internal" href="scalabridge.html">Use Scala in a Python Notebook</a></li>
<li class="toctree-l2"><a class="reference internal" href="sparkmonitor.html">Spark Progress Monitor</a></li>
<li class="toctree-l2"><a class="reference internal" href="download.html">Download Data</a></li>
<li class="toctree-l2"><a class="reference internal" href="logging.html">Logging</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="develop.html">Develop for PixieDust</a></li>
<li class="toctree-l1"><a class="reference internal" href="pixieapps.html">PixieApps</a></li>
<li class="toctree-l1"><a class="reference internal" href="pixiegateway.html">PixieGateway</a></li>
<li class="toctree-l1"><a class="reference internal" href="releasenotes.html">Release Notes</a></li>
</ul>

<div id="searchbox" style="display: none" role="search">
  <h3>Quick search</h3>
    <form class="search" action="search.html" method="get">
      <div><input type="text" name="q" /></div>
      <div><input type="submit" value="Go" /></div>
      <input type="hidden" name="check_keywords" value="yes" />
      <input type="hidden" name="area" value="default" />
    </form>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
  <footer id="pagefooter">&copy; 2017, IBM.
      Created using <a href="http://sphinx-doc.org/">Sphinx</a>
      1.6.3.

  </footer>

  
  </body>
</html>